Construction of the Turkish National Corpus (TNC)

نویسندگان

  • Yesim Aksan
  • Mustafa Aksan
  • Ahmet Koltuksuz
  • Taner Sezer
  • Ümit Mersinli
  • Umut Ufuk Demirhan
  • Hakan Yilmazer
  • Gülsüm Atasoy
  • Seda Öz
  • Ipek Yildiz
  • Özlem Kurtoglu
چکیده

This paper addresses theoretical and practical issues experienced in the construction of Turkish National Corpus (TNC). TNC is designed to be a balanced, large scale (50 million words) and general-purpose corpus for contemporary Turkish. It has benefited from previous practices and efforts for the construction of corpora. In this sense, TNC generally follows the framework of British National Corpus, yet necessary adjustments in corpus design of TNC are made whenever needed. All throughout the process, different types of open-source software are used for specific tasks, and the resulting corpus is a free resource for non-commercial use. This paper presents TNC’s design features, web-based corpus management system, carefully planned workflow and its webbased user-friendly search interface.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Associative Measures and Multi-word Unit Extraction in Turkish

Associative measures are “mathematical formulas determining the strength of association between two or more words based on their occurrences and cooccurrences in a text corpus” (Pecina, 2010, p. 138). The purpose of this paper is to test the 12 associative measures that Text-NSP (Banerjee & Pedersen, 2003) contains on a 10-million-word subcorpus of Turkish National Corpus (TNC) (Aksan et.al., 2...

متن کامل

Thai National Corpus: A Progress Report

This paper presents problems and solutions in developing Thai National Corpus (TNC). TNC is designed to be a comparable corpus of British National Corpus. The project aims to collect eighty million words. Since 2006, the project can now collect only fourteen million words. The data is accessible from the TNC Web. Delay in creating the TNC is mainly caused from obtaining authorization of copyrig...

متن کامل

A Contrastive Study of Theme in English and Azerbaijani Turkish Fictional Texts

Thematisationis one of the troublesome areas both for translation purposes from or into English and also for learning EFL. The main reason for the problem lies in the fact that usually different languages structure thematisation in different ways. Therefore, the present research is an attempt to investigate contrastively: experiential (topical), interpersonal and textual themes in a sample of A...

متن کامل

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

متن کامل

Word-Forming Process in Azeri Turkish Language

The subject intended to study the general methods of natural word-forming in Azeri Turkish language. This study aimed to reach this purpose by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) did a comprehensive study on word-forming process in Farsi, which was the inspiration source of this study for Azeri Turkish language word-forming. Numerous scholars had done vari...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012